Abstract



Aims. We study the spectral classification of emission-line galaxies as star-forming galaxies or Active Galactic Nuclei (AGNs). From
the Sloan Digital Sky Survey (SDSS) high quality data, we define an improved classification to be used for high redshift galaxies.
Methods. We classify emission-line galaxies of the SDSS according to the latest standard recipe using the [O III]λ5007, [N II]λ6584,
[S II]λλ6717+6731, Hα, and Hβ emission lines. We obtain four classes: star-forming galaxies, Seyfert 2, LINERs, and composites.
We then examine where these galaxies fall in the blue diagram used at high redshift (i.e. log([O III]λ5007/Hβ) vs.
log([O II]λλ3726+3729/Hβ)).

Results. We define new improved boundaries in the blue diagram for the star-forming galaxies, Seyfert 2, LINERs, SF/Sy2, and
SF-LIN/comp classes. We maximize the success rate to 99.7% for the detection of star-forming galaxies, to 86% for the Seyfert 2
(including the SF/Sy2 region), and to 91% for the LINERs. We also minimize the contamination to 16% in the region of star-forming
galaxies. We cannot reliably separate composites from star-forming galaxies and LINERs, but we define a SF-LIN/comp region where
most of them fall (64%).

1. Introduction

Spectral classification of emission-line galaxies at low redshift
is now routinely done with high quality calibrations. Using a
set of five strong emission lines ([O III]λ5007, [N II]λ6584,
[S II]λλ6717+6731, Hα, and Hβ), one can reliably distinguish
star-forming galaxies, Seyfert 2 galaxies, Low Ionization
Nuclear Emission Regions (hereafter LINERs, see Heckman
1980), and composite galaxies with both star-forming regions
and an active galactic nucleus (hereafter AGN). The equations
for this classification have been derived successively by several
authors (Baldwin et al. 1981; Veilleux & Osterbrock 1987;
Kewley et al. 2001; Kauffmann et al. 2003; Kewley et al. 2006,
among others).

At redshifts greater than z ~ 0.4, the [N II]λ6584,
[S II]λλ6717+6731, and Hα emission lines get redshifted out of
the wavelength range of all major optical spectroscopic surveys.
Therefore, diagnostic diagrams need to be based only on emission
lines observed in the blue part of the spectra: [O III]λ5007,
[O II]λλ3726+3729, and Hβ. Such diagrams, which have been
used in the past e.g. by Tresse et al. (1996) or Rola et al. (1997),
and which we call the "blue diagram", have recently been studied
again by Lamareille et al. (2004). The latter have derived,
from the 2dFGRS data, equations to distinguish star-forming
galaxies from AGNs. They have also shown that a region exists
in this diagram where both star-forming galaxies and AGNs
fall (hereafter the "uncertainty region") and thus cannot be
distinguished.



With the high quality Sloan Digital Sky Survey Data Release
7 (hereafter SDSS DR7) data, it is now possible to revisit the
blue diagram, and to derive new equations more compatible with
the latest red classification by Kewley et al. (2006) than the ones
given in Lamareille et al. (2004). In particular, we derive in this
paper more precise boundaries between the star-forming and
AGN regions, new boundaries for the regions where AGNs or
composites are mixed with star-forming galaxies, and new
equations to distinguish between Seyfert 2 galaxies and LINERs in
the blue diagram.

All spectral classifications, associated numbers,
and figures presented in this paper have been done
with the JClassif spectral classification pipeline,
which is freely available at the following website:
http://www.ast.obs-mip.fr/users/flamare/galaxie/.
This paper is organized as follows: we recall the current
classification scheme in Sect. 2 and apply it to SDSS DR7 data. Then,
we define our new improved classification for high redshift
galaxies in Sect. 3. Finally, we show an example application
of our new classification in Sect. 4.



2. The current classification scheme 

2.1. Data selection and the red classification 

Figure 1. Redshift histograms of our data sample. Top: The solid
line is for the whole sample, the dashed-dotted line is for
star-forming galaxies. Bottom: The solid line is for composites, the
dashed line for LINERs, and the dashed-dotted line for Seyfert 2.

Figure 2. This is the reference "red" classification of emission-line galaxies at low redshift. The two diagnostic diagrams
show the relation between two line ratios: log([O III]λ5007/Hβ) vs. log([N II]λ6584/Hα) (left) and log([O III]λ5007/Hβ) vs.
log([S II]λλ6717+6731/Hα) (right). Star-forming galaxies are shown in blue, composites in magenta, Seyfert 2 in green, and
LINERs in cyan. The red curves show the empirical or theoretical separations: the solid curve (left and right) is Kewley et al.
(2001), the dotted curve (left) is Kauffmann et al. (2003), the horizontal line (left) is Veilleux & Osterbrock (1987), and the solid
line (right) is Kewley et al. (2006).

We use SDSS DR7 emission-line measurements of
868 492 galaxies in the redshift range 0.0 < z < 0.2.
These data are available online at the following address:
http://www.mpa-garching.mpg.de/SDSS/DR7/. The measurements
are available for 927 552 different spectra, of which
109 219 spectra are duplicated (twice or more) observations
of the same galaxy. We have averaged the measurements of
duplicated spectra in order to increase the signal-to-noise ratio.
Measurements which do not increase the averaged
signal-to-noise ratio have been discarded. We select emission-line
galaxies with the following criterion: the signal-to-noise ratio
in the equivalent width of the emission lines used in our
study must be greater than 5. The emission lines needed to
derive a spectral classification at low redshift are [O III]λ5007,
[N II]λ6584, [S II]λλ6717+6731, Hα, and Hβ. We also need the
[O II]λλ3726+3729 emission line, which will be used to derive
our new high redshift classification. We end up with 89 379
emission-line galaxies with the desired minimum
signal-to-noise ratio. We sort these galaxies into four classes according to
the Kewley et al. (2006) classification scheme, with the
following numbers: 67 778 star-forming galaxies, 2 949 Seyfert
2, 4 912 LINERs, and 13 740 composites. Figure 1 shows the
redshift histograms of the four classes of emission-line galaxies.
We find that the targeted population has the same dependence
on redshift for each of the four classes, with a peak around
z ~ 0.07, except for the LINERs, whose proportion increases
at low redshift as compared to the other classes. This possible
bias has to be noted, even if it does not affect the classification
derived in this paper, which is not primarily based on relative
proportions between classes.
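
The selection criterion above can be sketched as follows. This is only an illustration, not the JClassif implementation: the dictionary layout, the line keys, and the function name `passes_snr_cut` are hypothetical, and the signal-to-noise ratio is assumed to be the equivalent width divided by its 1-sigma uncertainty.

```python
# Hypothetical keys for the six lines required in the text.
REQUIRED_LINES = (
    "OII_3726_3729", "Hbeta", "OIII_5007",
    "Halpha", "NII_6584", "SII_6717_6731",
)


def passes_snr_cut(ew, ew_err, min_snr=5.0):
    """Return True if every required line has |EW| / sigma(EW) > min_snr.

    ew and ew_err map a line key to the equivalent width and to its
    1-sigma uncertainty (assumed layout, same units for both).
    """
    for line in REQUIRED_LINES:
        if line not in ew or line not in ew_err:
            return False  # missing measurement: reject the galaxy
        if ew_err[line] <= 0:
            return False  # unusable uncertainty: reject the galaxy
        if abs(ew[line]) / ew_err[line] <= min_snr:
            return False  # the cut is strict: S/N must exceed min_snr
    return True
```

A galaxy is kept only if all six lines pass, which is why the sample shrinks from several hundred thousand galaxies to 89 379.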

Figure 2 shows this classification in the standard BPT
diagrams. In the left diagram, we note the difference
between the old classification of Seyfert 2 and LINERs
(Veilleux & Osterbrock 1987) and the new one defined by
Kewley et al. (2006). The right diagram cannot be used to
distinguish star-forming galaxies from composites, which fall in
the same region of this diagram. This effect has been clearly
explained by Stasińska et al. (2006) with photoionization
models: the [O III]λ5007/Hβ vs. [N II]λ6584/Hα diagnostic
diagram is the only one where composites and LINERs clearly
separate from star-forming galaxies thanks to the so-called
"seagull wings". The same applies to LINERs and composites,
which are not clearly separated in the [O III]λ5007/Hβ vs.
[S II]λλ6717+6731/Hα diagnostic diagram. Unlike Kewley et al.
(2006), we did not classify as "ambiguous" the composites
which fall in the LINERs region, since this latter diagram is not
accurate in separating composites and LINERs.

Figure 3. The "blue" classification of emission-line galaxies
at high redshift. The diagnostic diagram shows the relation
between two line ratios: log([O III]λ5007/Hβ) vs.
log([O II]λλ3726+3729/Hβ). According to the red classification
(see Fig. 2), star-forming galaxies are shown in blue,
composites in magenta, Seyfert 2 in green, and LINERs in
cyan. The red curves show the empirical separations defined
by Lamareille et al. (2004): the solid curve is the separation
between star-forming galaxies and AGNs, the dashed curves show
the uncertainty region.


2.2. The blue classification 

We now derive the blue classification and sort the emission-line
galaxies into one of the four classes defined by Lamareille et al.
(2004), with the following result: 83 654 secure star-forming
galaxies, 699 secure AGNs, 3 670 candidate star-forming galaxies,
and 1 356 candidate AGNs.

Figure 3 shows where the galaxies classified with the red
classification fall in the blue diagram. The weaknesses of the
blue classification as compared to the Kewley et al. (2006)
classification scheme applied to the same data are evident in Fig. 3,
thanks to the high quality of SDSS's line measurement software
and to our signal-to-noise cut. The blue classification is clearly
strongly biased against LINERs, the majority of which are
classified as star-forming galaxies or candidate star-forming
galaxies in the blue diagram. The uncertainty region is actually
dominated by AGNs (83%), while a non-negligible number of
Seyfert 2 and LINERs (38%) are misclassified as "secure"
star-forming galaxies. Composites were classified as star-forming
galaxies by Lamareille et al. (2004) on 2dFGRS data. This
led them to choose an empirical separation that goes further to
the right of the blue diagram than necessary.

Nevertheless, the empirical separation defined by
Lamareille et al. (2004) from 2dFGRS data (the solid curve)
does not follow the actual boundary between star-forming
galaxies and AGNs, as seen with SDSS DR7 data. This
separation may then be improved. We may also define a
new uncertainty region, and a separation between Seyfert 2
and LINERs. The composites cannot be distinguished from
star-forming galaxies, since they fall in the same region of the
blue diagram. As mentioned above, this trend is also present
in the log([O III]λ5007/Hβ) vs. log([S II]λλ6717+6731/Hα)
diagnostic diagram (see Fig. 2, right). Unfortunately, it cannot be
avoided at high redshift without the [N II]λ6584 emission-line
measurement.

The red classification is not sensitive to reddening since
it uses ratios of emission lines which are close in wavelength.
Conversely, the blue classification uses a line ratio,
[O II]λλ3726+3729/Hβ, involving two lines which are not
close in wavelength. Using equivalent widths instead of fluxes,
as in this paper, removes the direct dependence on reddening. Still,
reddening does not affect the flux of emission lines and the
flux of the underlying stellar continuum in exactly the same
way. There is thus an indirect dependence of the
[O II]λλ3726+3729/Hβ line ratio on reddening when it is
calculated with equivalent widths. In any case, this dependence is greatly
reduced as compared to ratios of line fluxes.
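
The argument can be written out explicitly under a simplifying assumption that is ours rather than the paper's: at wavelength λ, the emission line is attenuated by A_gas(λ) magnitudes and the stellar continuum by A_⋆(λ) magnitudes, with the superscript 0 denoting unattenuated quantities.

```latex
% Sketch only: A_gas and A_star are assumed attenuations (in magnitudes)
% of the emission line and of the stellar continuum, respectively.
\begin{align}
\mathrm{EW}(\lambda)
  &= \frac{F_{\rm line}}{F_{\rm cont}(\lambda)}
   = \frac{F^{0}_{\rm line}\,10^{-0.4\,A_{\rm gas}(\lambda)}}
          {F^{0}_{\rm cont}(\lambda)\,10^{-0.4\,A_{\star}(\lambda)}}
   = \mathrm{EW}^{0}(\lambda)\,10^{-0.4\,\Delta A(\lambda)}, \\
\Delta A(\lambda) &\equiv A_{\rm gas}(\lambda) - A_{\star}(\lambda), \\
\frac{\mathrm{EW}([\mathrm{O\,II}])}{\mathrm{EW}(\mathrm{H}\beta)}
  &= \frac{\mathrm{EW}^{0}([\mathrm{O\,II}])}{\mathrm{EW}^{0}(\mathrm{H}\beta)}
     \,10^{-0.4\,[\Delta A(3727) - \Delta A(4861)]}.
\end{align}
```

If gas and stars were attenuated identically, ΔA would vanish and the ratio would be free of reddening. The residual dependence comes only from the difference between the two attenuations, whereas a ratio of line fluxes carries the full factor 10^{-0.4 [A_gas(3727) - A_gas(4861)]}.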



3. The improved classification 

We now define the new improved blue classification of
emission-line galaxies. Figure 4 shows how the objects of different classes,
according to the red classification, fall in the new blue diagram.



3.1. The new star-forming - AGN boundary 

We define a new boundary that follows more precisely
the star-forming galaxies region, as did Kauffmann et al.
(2003) compared to the Kewley et al. (2001) boundary in the
log([O III]λ5007/Hβ) vs. log([N II]λ6584/Hα) diagnostic
diagram. According to the old blue classification, 87 324 galaxies
were classified as secure or candidate star-forming galaxies. But
19% of them are actually not star-forming galaxies according to
the red classification. We can reduce this contamination by
excluding almost all LINERs and most of the Seyfert 2 with a more
conservative separation.

The equation that minimizes the contamination is:

\[
\log([\mathrm{O\,III}]/\mathrm{H}\beta) = \frac{0.11}{\log([\mathrm{O\,II}]/\mathrm{H}\beta) - 0.92} + 0.85. \tag{1}
\]



It corresponds to the solid curve in Fig. 4. Star-forming galaxies
are below this curve, AGNs are above. The contamination is
minimized to 16%. The minimization was done by eye, while
also trying to maximize the success rate. Figure 3 shows
that a majority of composites and a number of Seyfert 2 cannot
be excluded from the region of star-forming galaxies, which
explains why the contamination cannot be zero. We check that
99.7% of the star-forming galaxies, according to the red
classification, are correctly classified with the new blue classification,
which is quite satisfactory.
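
A minimal sketch of this boundary test, reading Eq. 1 as y = 0.11 / (x - 0.92) + 0.85 with x = log([O II]/Hβ) and y = log([O III]/Hβ). The function names are hypothetical, and the treatment of points at or beyond the vertical asymptote x = 0.92 as AGNs is our assumption, since the text does not discuss that case.

```python
def sf_agn_boundary(x):
    """Value of y = log([O III]/Hbeta) on the Eq. 1 curve, valid for x < 0.92."""
    return 0.11 / (x - 0.92) + 0.85


def is_star_forming(x, y):
    """Return True if the point (x, y) lies below the Eq. 1 curve.

    Assumption (not stated in the text): points with x >= 0.92, at or
    beyond the vertical asymptote of the curve, are treated as AGNs.
    """
    if x >= 0.92:
        return False
    return y < sf_agn_boundary(x)
```

Checking `x >= 0.92` first also avoids a division by zero exactly on the asymptote.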

3.2. The mixed regions

Even if almost all star-forming galaxies can be correctly
classified using the blue diagram, we know that not all galaxies
classified as star-forming by the blue classification are actual
star-forming galaxies. As shown in the right panel of Fig. 4,
a non-negligible number of Seyfert 2 galaxies fall into the
region of star-forming galaxies. From this plot, we easily define
the boundary of the region where star-forming galaxies become
mixed with Seyfert 2:



\[
\log([\mathrm{O\,III}]/\mathrm{H}\beta) > 0.3. \tag{2}
\]



This is the horizontal line in Fig. 4. We call SF/Sy2 all the galaxies
above this line. Counting the region of AGNs (as defined by
Eq. 1) and the region of SF/Sy2, 86% of actual Seyfert 2 are
correctly classified with our new blue classification (59% as Seyfert
2, 26% as SF/Sy2). The region of SF/Sy2 is nevertheless
dominated by star-forming galaxies (74%). The left panel of Fig. 4
shows that, unlike Seyfert 2, LINERs do not significantly get
mixed with star-forming galaxies: 91% of them are correctly
classified without the need to define a SF/LINER region.

We also need to consider the case of composites, which fall
in the region of star-forming galaxies and LINERs in the blue
diagram: 85% of the composites are classified as star-forming
galaxies, and 16% as LINERs in our new classification. We see
from the right panel of Fig. 4 that almost all the composites
happen to fall below the line defined in Eq. 2. Thus this line can be
used to define the region where one should expect to find
composites. However, the majority of the composites fall in a much
narrower region. We define this region with the two following
inequalities:

\[
\begin{aligned}
y &< -(x - 1.0)^2 - 0.1\,x + 0.25 \\
y &> (x - 0.2)^2 - 0.6
\end{aligned}
\tag{3}
\]

where y = log([O III]/Hβ), and x = log([O II]/Hβ). We call
SF-LIN/comp all the galaxies in this region, which straddles
the star-forming galaxies and the LINERs. 64% of the actual
composites fall in our SF-LIN/comp region. The SF-LIN/comp
region is composed of 79% star-forming galaxies, 19%
composites, and 2% LINERs.
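
The two inequalities of Eq. 3 translate directly into a membership test. The sketch below assumes the garbled exponents in the source are squares, and the function name is hypothetical.

```python
def in_sf_lin_comp_region(x, y):
    """Return True if (x, y) satisfies both inequalities of Eq. 3.

    x = log([O II]/Hbeta), y = log([O III]/Hbeta).
    """
    upper = -(x - 1.0) ** 2 - 0.1 * x + 0.25  # first inequality: y < upper
    lower = (x - 0.2) ** 2 - 0.6              # second inequality: y > lower
    return lower < y < upper
```

With these coefficients the two parabolas enclose a band only over a limited range of x. For x = 0, for example, the upper bound (-0.75) lies below the lower bound (-0.56), so no point can belong to the region there.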

3.3. The new Seyfert 2 - LINER boundary 

We define an empirical boundary that allows one to distinguish
Seyfert 2 from LINERs in the AGNs region of the blue diagram.
It is shown as a solid diagonal line in Fig. 4 and follows the
equation below:

\[
\log([\mathrm{O\,III}]/\mathrm{H}\beta) = 0.95 \times \log([\mathrm{O\,II}]/\mathrm{H}\beta) - 0.4. \tag{4}
\]

This separation minimizes the number of misclassifications be- 
tween these two classes. Only 8% of the Seyfert 2 and 4% of the 
LINERs, according to the red classification, are misclassified re- 
spectively as LINERs or Seyfert 2, according to the new blue 
classification. 
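
Putting Eqs. 1, 2, and 4 together gives a compact classifier for the three main classes. This is a sketch under stated assumptions, not the JClassif code: Eq. 1 is read with a constant term of +0.85, points with x ≥ 0.92 are treated as AGNs, Seyfert 2 are taken to lie above the Eq. 4 line and LINERs below it, and the SF/Sy2 flag is applied only inside the star-forming region. The function name and return convention are hypothetical.

```python
def classify_blue(x, y):
    """Classify one galaxy in the blue diagram.

    x = log([O II]/Hbeta), y = log([O III]/Hbeta), from equivalent widths.
    Returns (main_class, is_sf_sy2).
    """
    # Eq. 1: star-forming galaxies lie below the curve.
    # Assumption: x >= 0.92 (at or past the asymptote) counts as AGN.
    below_eq1 = x < 0.92 and y < 0.11 / (x - 0.92) + 0.85
    if below_eq1:
        main_class = "SFG"
    elif y > 0.95 * x - 0.4:
        # Eq. 4. Assumption: Seyfert 2 lie above the line, LINERs below.
        main_class = "Seyfert 2"
    else:
        main_class = "LINER"
    # Eq. 2. Assumption: the SF/Sy2 flag only applies inside the SFG region.
    is_sf_sy2 = main_class == "SFG" and y > 0.3
    return main_class, is_sf_sy2
```

Returning the flag separately mirrors Table 1, where SF/Sy2 objects are already counted in one of the three main classes.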



4. An example application 

Our new classification will be useful for building samples of
star-forming galaxies at high redshift. As an example, we show in
Fig. 5 updated results obtained for VVDS data with our new
classification. We refer the reader to Lamareille et al. (2009)
for details. The left panel shows that we now obtain fewer
star-forming galaxies than with the previous classification scheme.



Table 1. This table gives the number of galaxies in each class
of the reference classification (columns) as a function of each
class of the new blue classification (rows). Please note that the
SF/Sy2 and SF-LIN/comp are already counted in one of the three 
main classes (i.e. star-forming galaxies, Seyfert 2 or LINERs) 
which may be summed to get the total number of objects. We use 
the following abbreviations: SFG: star-forming galaxies; Sey. 2: 
Seyfert 2. 





               total      SFG   Sey. 2    LINER    comp.
total         89 379   67 778    2 949    4 912   13 740
SFG           80 312   67 539      952      270   11 551
Sey. 2         2 016       55    1 748      187       26
LINER          7 051      184      249    4 455    2 163
SF/Sy2         3 603    2 668      774        6      155
SF-LIN/comp   47 461   37 669       17      988    8 787
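
The success rates and contamination quoted in the text can be checked approximately from the counts of Table 1. The sketch below uses only the three main rows of the table. The definitions of "success rate" (fraction of a red class recovered in the same blue class) and "contamination" (fraction of a blue class that belongs to a different red class) are our reading of the text, and the names are hypothetical.

```python
# Columns: reference (red) classes. Rows: main classes of the new blue
# classification. Counts copied from Table 1.
RED_CLASSES = ("SFG", "Sey. 2", "LINER", "comp.")
TABLE = {
    "SFG":    (67539, 952, 270, 11551),
    "Sey. 2": (55, 1748, 187, 26),
    "LINER":  (184, 249, 4455, 2163),
}


def success_rate(cls):
    """Fraction of red-class `cls` objects that end up in blue class `cls`."""
    col = RED_CLASSES.index(cls)
    red_total = sum(row[col] for row in TABLE.values())
    return TABLE[cls][col] / red_total


def contamination(cls):
    """Fraction of blue-class `cls` objects whose red class is different."""
    row = TABLE[cls]
    return 1.0 - row[RED_CLASSES.index(cls)] / sum(row)


for cls in TABLE:
    print(f"{cls}: success {success_rate(cls):.3f}, "
          f"contamination {contamination(cls):.3f}")
```

The Seyfert 2 success rate computed this way counts only the Seyfert 2 row. Adding the 774 Seyfert 2 found in the SF/Sy2 region gives the combined figure quoted in the text.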



In the right panel, this leads to slightly different estimates of
the mass-metallicity relation in the 0.5 < z < 0.6 and 0.6 < z <
0.8 redshift ranges. With our improved classification, we now
confirm and even strengthen the conclusion of Lamareille et al.
(2009) that the metallicity evolution of star-forming galaxies is
less significant as a function of redshift for lower mass galaxies
than for high mass galaxies.

For galaxies of masses ~ lO^M©, the metallicity evolution is
0.06 dex and 0.10 dex lower, respectively, in the two
above-mentioned redshift ranges than the results obtained with the old
classification of star-forming galaxies. The difference between the
old and the new classification is not significant for galaxies of
masses ^ IO^'^-^Mq.



5. Conclusion 

Table 1 summarizes the distribution of the objects in the new
blue classification, as compared to the reference red
classification. A large majority of star-forming galaxies are correctly
classified with the new blue classification. But the new blue
classification also suffers from a non-negligible contamination of
the star-forming regions by composites, which should be taken
into account in studies of star-forming galaxies. We note
however that a number of composites, according to the classification
of Kauffmann et al. (2003), may actually only be star-forming
galaxies. Kewley et al. (2001) have shown from theoretical
modeling that pure star-forming spectra can be expected in the
region later defined as the composite region by Kauffmann et al.
(2003). Stasińska et al. (2006) have also shown that the
composites allow an AGN contribution up to 20%, but this does not
mean that this contribution cannot be lower than 20%, or even be
zero. True composites may be confirmed only from far infrared
or X-ray observations.

We define the region of SF-LIN/comp which contains the
majority of actual composites, but is dominated by actual
star-forming galaxies and LINERs. The composites also
contaminate the region of LINERs in our new classification. However,
most of the actual LINERs are correctly classified with our new
classification.

Finally, the region of Seyfert 2 in our new classification is
almost only composed of actual Seyfert 2, with no significant
contamination. But about a third of the actual Seyfert 2 are
only classified as SF/Sy2 in our new classification; this class is
unfortunately dominated by actual star-forming galaxies. The
DEW classification proposed by Stasińska et al. (2006) actually
complements our classification in correctly classifying SF/Sy2
galaxies. This will be discussed in further detail in the second
paper of this series. Our new classification can be used to define
samples of star-forming galaxies (see Sect. 4), but also samples
of Seyfert 2 or LINERs (e.g. to compute luminosity functions)
in a much more accurate way than the previous blue
classification scheme. However, it cannot be used to derive samples of
composites, since they get mixed with star-forming galaxies and
LINERs.

Figure 4. This is the new improved "blue" classification of emission-line galaxies. The two diagnostic diagrams show the relation
between two line ratios: log([O III]λ5007/Hβ) vs. log([O II]λλ3726+3729/Hβ). According to the red classification (see Fig. 2),
star-forming galaxies are shown in blue, LINERs in cyan, composites in magenta, and Seyfert 2 in green. For clarity, the first two
classes are shown only in the left panel, while the last two classes are shown only in the right panel. The red curves show the
new empirical separations defined in the text: between star-forming galaxies and AGNs (Eq. 1), between Seyfert 2 and LINERs
(Eq. 4), between star-forming galaxies and SF/Sy2 (Eq. 2). The black dashed curves delimit the region where the majority of
composites lie (SF-LIN/comp region, Eq. 3).

Figure 5. Left: Spectral classification of VVDS galaxies (Lamareille et al. 2009). According to our new classification given in
Eq. 1, we classify these objects as star-forming galaxies (blue triangles) and AGNs (green squares). The red curves show the older
classification scheme by Lamareille et al. (2004, see also Fig. 3). Right: Relation between the logarithm of the stellar mass and the
gas-phase oxygen abundance of star-forming galaxies in the VVDS sample. Dashed lines and open symbols show the old results
as given in Lamareille et al. (2009). Solid lines and filled symbols show the new results with our new classification. The results are
given in the 0.5 < z < 0.6 (dark gray) and 0.6 < z < 0.8 (light gray) redshift ranges.